
7 research outputs found

Sieving in primality testing and factorization

Author: Vatai Emil
Publication date: 01/01/2014

ELTE Digital Institutional Repository (EDIT)

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache

Author: Chen Peng
Domke Jens
Drozd Aleksandr
Gerofi Balazs
Kodama Yuetsu
Matsuoka Satoshi
Mittal Sparsh
Pericàs Miquel
Podobas Artur
Vatai Emil
Wahib Mohamed
Zhang Lingqi
Publication date: 05/04/2022

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper bound in performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of LARC, a processor fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal where HPC CPU performance could be circa 2028, and conclude an average boost of 9.77x for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.

arXiv.org e-Print Archive

Discrete mathematics I.

Author: Vatai Emil
Publication venue: 'Department of Polymer Engineering, Scientific Society of Mechanical Engineering'
Publication date: 01/01/2016

Prepared within the framework of the ELTE program supported by the Higher Education Restructuring Fund.

ELTE Digital Institutional Repository (EDIT)

Cache optimized linear sieve

Author: Járai Antal
Vatai Emil
Publication date: 01/01/2011

Sieving is essential in various number-theoretic algorithms. Sieving with large primes violates locality of memory access, thus degrading performance. Our suggestion for tackling this problem is to use cyclic data structures in combination with in-place bucket sort. We present our results on an implementation of the sieve of Eratosthenes using these ideas, which show that this approach is more robust and less affected by slow memory.
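The abstract above combines a segmented sieve with buckets held in a cyclic structure. The following is a minimal sketch of that general idea, not the authors' implementation: it uses ordinary Python lists rather than an in-place bucket sort, and the names `bucket_sieve`, `simple_sieve`, `segment_size`, and `ring` are hypothetical. Primes smaller than a segment are crossed off directly; each larger prime hits at most one position per segment, so it is filed in the bucket of the segment that contains its next multiple, and the buckets are reused cyclically.

```python
from math import isqrt


def simple_sieve(n):
    """Plain sieve of Eratosthenes, used only for the base primes."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            flags[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, f in enumerate(flags) if f]


def bucket_sieve(limit, segment_size=1024):
    """Return all primes <= limit, sieving one segment at a time."""
    if limit < 2:
        return []
    root = isqrt(limit)
    base = simple_sieve(root)
    small = [p for p in base if p < segment_size]
    large = [p for p in base if p >= segment_size]

    # Cyclic array of buckets: a large prime's next multiple is at most
    # root // segment_size + 1 segments ahead, so this many slots suffice.
    ring_size = root // segment_size + 2
    ring = [[] for _ in range(ring_size)]
    next_large = 0  # index of the first large prime not yet activated
    primes = []

    for k in range(limit // segment_size + 1):
        lo = k * segment_size
        hi = min(lo + segment_size, limit + 1)
        flags = bytearray([1]) * (hi - lo)

        # Small primes touch every segment, so cross them off directly.
        for p in small:
            start = max(p * p, (lo + p - 1) // p * p)
            if start < hi:
                flags[start - lo::p] = bytearray(len(range(start, hi, p)))

        # Activate large primes whose square falls into this segment.
        slot = k % ring_size
        while next_large < len(large) and large[next_large] ** 2 < hi:
            p = large[next_large]
            ring[slot].append((p, p * p))
            next_large += 1

        # Drain this segment's bucket and re-file each prime under the
        # segment that holds its next multiple.
        bucket, ring[slot] = ring[slot], []
        for p, m in bucket:
            flags[m - lo] = 0
            m += p
            if m <= limit:
                ring[(m // segment_size) % ring_size].append((p, m))

        primes.extend(lo + i for i in range(hi - lo)
                      if flags[i] and lo + i >= 2)
    return primes
```

With this layout, each segment only visits the large primes that actually have a multiple in it, which is the locality property the abstract is concerned with. A small `segment_size` (for example 16) forces most base primes into the bucket path and is handy for checking the logic.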

arXiv.org e-Print Archive

ELTE Digital Institutional Repository (EDIT)

Why globally re-shuffle? Revisiting data shuffling in large scale deep learning

Author: Domke Jens
Drozd Aleksandr
Gerofi Balazs
Liao Jianwei
Nguyen Thao Truong
Trahay François
Vatai Emil
Wahib Mohamed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/05/2022

Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural Networks (DNN). SGD iterates the input data set in each training epoch, processing data samples in a random access fashion. Because this puts enormous pressure on the I/O subsystem, the most common approach to distributed SGD in HPC environments is to replicate the entire data set to node-local SSDs. However, due to rapidly growing data set sizes, this approach has become increasingly infeasible. Surprisingly, the questions of why and to what extent random access is required have not received a lot of attention in the literature from an empirical standpoint. In this paper, we revisit data shuffling in DL workloads to investigate the viability of partitioning the data set among workers and performing only a partial distributed exchange of samples in each training epoch. Through extensive experiments on up to 2,048 GPUs of ABCI and 4,096 compute nodes of Fugaku, we demonstrate that in practice validation accuracy of global shuffling can be maintained when carefully tuning the partial distributed exchange. We provide a solution implemented in PyTorch that enables users to control the proposed data exchange scheme.
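To make the notion of a partial distributed exchange concrete, here is a small single-process simulation. It is only an illustrative sketch under stated assumptions, not the PyTorch solution mentioned in the abstract: the function name `partial_exchange`, the `fraction` parameter, and the pool-and-redistribute exchange pattern are all hypothetical choices. Each worker keeps most of its partition, contributes a fraction of its samples to a shared pool, receives the same number back, and then shuffles locally.

```python
import random


def partial_exchange(partitions, fraction, rng):
    """Simulate one epoch of partial sample exchange between workers.

    partitions: list of per-worker sample lists (left unmodified).
    fraction:   share of each worker's samples offered for exchange
                (0.0 means purely local shuffling).
    rng:        a random.Random instance, for reproducibility.
    """
    kept, sent_counts, pool = [], [], []
    for part in partitions:
        local = list(part)
        rng.shuffle(local)
        k = int(len(local) * fraction)
        pool.extend(local[:k])     # samples this worker gives away
        kept.append(local[k:])     # samples that stay on the worker
        sent_counts.append(k)

    # Mix the exchanged samples; a worker may get some of its own back.
    rng.shuffle(pool)

    new_partitions, pos = [], 0
    for local, k in zip(kept, sent_counts):
        merged = local + pool[pos:pos + k]  # receive as many as were sent
        pos += k
        rng.shuffle(merged)                 # local shuffle for this epoch
        new_partitions.append(merged)
    return new_partitions


# Example: 4 workers with 8 samples each, exchanging a quarter per epoch.
rng = random.Random(0)
parts = [list(range(w * 8, (w + 1) * 8)) for w in range(4)]
for _ in range(3):
    parts = partial_exchange(parts, 0.25, rng)
```

In this sketch the partition sizes stay constant and no sample is lost or duplicated; `fraction` plays the role of the tuning knob, trading I/O and communication volume against how quickly samples mix across workers.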

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot
